While watching a video with audible verbal content, such as a song or spoken dialogue, a user may hear the content unclearly, or even miss the content entirely due to being away for a moment or not paying sufficient attention. If the user wants to hear the content again, the user can rewind the video to the position with the unclear verbal content to re-hear it, or try to search online for information to clarify the verbal content. Both of these methods are inefficient and/or ineffective. Rewinding the video to re-hear the verbal content lengthens the time the user spends watching the video and may not cure the problem, as intrinsic characteristics of the verbal content (e.g., an accent) may be what makes the verbal content unclear. Making the user search for information online distracts the user from the video itself, and may be ineffective, as information on the verbal content may be unavailable for searching by users.